FROM: LAW OFFICES OF TERRY McHUGH

PHONE NO.: 650 969 6216

Nov. 18 2003 12:18PM P10



HEWLETT-PACKARD COMPANY 
Intellectual Property Administration 

P.O. Box 272400

Fort Collins, Colorado 80527-2400 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

Inventor(s): Jun Gao et al. Confirmation No.: 5995 

Application No.: 10/050,346 Examiner: Yaritza Guadalupe 

Filing Date: January 15, 2002 Group Art Unit: 2859 
Title: CLUSTER-WEIGHTED MODELING FOR MEDIA CLASSIFICATION 



COMMISSIONER FOR PATENTS 
P.O. Box 1450 
Alexandria, VA 22313-1450 

DECLARATION OF JUN GAO UNDER 37 CFR 1.132

Sir: 

1. My name is Jun Gao. I am an employee of Hewlett-Packard
Company.

2. I am a co-inventor of the invention claimed in U.S. Patent 
Application Serial No. 10/050,346, which is entitled "CLUSTER-WEIGHTED 
MODELING FOR MEDIA CLASSIFICATION." The other co-inventor of this 
application is Ross R. Allen. 

3. I am also a co-inventor of the invention claimed in U.S. Pat. No. 
6,517,180, which is entitled "DOT SENSING, COLOR SENSING AND 
MEDIA SENSING BY A PRINTER FOR QUALITY CONTROL." The other 
co-inventors of this issued patent are Ross R. Allen, Barclay J. Tullis and 
Carl E. Picciotti. 

4. It is standard practice at Hewlett-Packard Company to submit an
"Invention Disclosure" to the company's legal department when potentially
patentable inventions are discovered as a consequence of work performed
for the company. Accompanying this Declaration is such an Invention
Disclosure. The accompanying Invention Disclosure is labeled Exhibit A and
consists of three pages in which entries are made into a standardized form
and eleven pages of an explanation of cluster-weighted modeling (CWM) and
its application to classifying print media.

PATENT APPLICATION
Attorney Docket No. [illegible]

PAGE 10/26 * RCVD AT 11/18/2003 3:10:10 PM [Eastern Standard Time] * SVR:USPTO-EFXRF-1/1 * DNIS:8729319 * CSID:650 969 6216 * DURATION (mm-ss):16-32

5. The invention that is the focus of the Invention Disclosure of 
Exhibit A is also the focus of the patent application identified in Paragraph 2 
of this Declaration. 

6. On the first page of Exhibit A, the Invention Disclosure is signed 
by Ross R. Allen and myself as inventors. There are no other inventors of the 
disclosed invention. 

7. On the second page of Exhibit A, Carl Picciotti, who is a
co-inventor of the patent identified in Paragraph 3 of this Declaration, is
identified as a witness to whom the invention was explained. The "Date of
Signature" is February 7, 2001.

8. Each page of the eleven-page document that is part of Exhibit A
is signed and dated by Ross R. Allen and myself as inventors. The two
signatures are dated January 24, 2001.

9. Each page of the eleven-page document that is part of Exhibit A is
also signed and dated by Carl Picciotti and Raymond Beausoleil. While not
apparent from the document, these signatures were witness signatures. The
date of the Carl Picciotti signature is February 7, 2001, the same date as that
on the second page of Exhibit A for the witness signature of Carl Picciotti.

10. That portion of the issued patent (identified in Paragraph 3 of
this Declaration) that describes using information about textural features of a print
medium as input parameters for a probabilistic input-output model in order to
classify the print medium is derived from the work of Ross R. Allen and myself
and represents the invention described in the Invention Disclosure of
Exhibit A.

11. I hereby declare that all statements made herein of my own 
knowledge are true and that all statements made on information and belief 
are believed to be true and further that the statements are made with the 
knowledge that willful false statements and the like so made are punishable 
by fine or imprisonment, or both, under Section 1001 of Title 18 of the United 
States Code and that such willful false statements may jeopardize the validity 
of the application or any patent issued thereon. 






Date: November 18, 2003 




Jun Gao 






Write in Dark Ink on Front Side Only, Please



INVENTION DISCLOSURE 



DATE RCVD 



PAGE ONE OF _ 

ATTORNEY 



INSTRUCTIONS: The information contained in this document is COMPANY CONFIDENTIAL and may not be disclosed to others without prior
authorization. Submit this disclosure to the HP Legal Department as soon as possible. No patent protection is possible until a patent application is
authorized, prepared, and submitted to the Government.



Descriptive Title of Invention:



Method to Identify Print Media in a Printer 



Name of Project:



HPL Labs Hardcopy Sensors



Product Name or Number: 



Was a description of the invention published or are you planning to publish? If so, the date(s) and publication(s):

Was a product including the invention announced, offered for sale, sold, or is such activity proposed? If so, the date(s) and location(s):
[illegible]

Was the invention disclosed to anyone outside of HP, or will such disclosure occur? If so, the date(s) and name(s):
NO



Was the invention described in a lab book or other record? If so, please identify (lab book #, etc.):
No. See attached document.



Was the invention built or tested? If so, the date:
Yes, July 2000



Was this invention made under a government contract? If so, the agency and contract number:
No



Description of Invention: Please preserve all records of the invention and attach additional pages for the following. Each additional page should
be signed and dated by the inventor(s) and witness(es).

A. Description of the construction and operation of the invention (include appropriate schematic, block, & timing diagrams; drawings; samples;
graphs; flowcharts; computer listings; test results; etc.)
B. Advantages of the invention over what has been done before.
C. Problems solved by the invention.
D. Prior solutions and their disadvantages (if available, attach copies of product literature, technical articles, patents, etc.).



Signature of Inventor(s): Pursuant to my (our) employment agreement, date: [illegible]

[illegible] JUN GAO

Employee No. Name

Telnet Mailstop Entity & Lab Name

Employee No. Name

Signature

Telnet Mailstop

Entity & Lab Name

Employee No. Name

Signature

Telnet Mailstop

Entity & Lab Name

Employee No. Name

(If more than four inventors [illegible])

Signature Telnet Mailstop Entity & Lab Name

Form 3.1 idf.doc, rev. 08/03/00






INVENTION DISCLOSURE (COMPANY CONFIDENTIAL)




Signature of Witness(es): (Please try to obtain the signature of the person(s) to whom the invention was first disclosed.)

The invention was first explained to, and understood by, me (us) on this date:

[signature of Carl Picciotti]

Date of Signature

2/7/2001






Inventor(s) Home Address Information: [illegible]

Inventor's Full Name

467 [illegible]

City

Greeted as [illegible] Citizenship

[illegible]

State

Zip

Do you have a Residential P.O. Address? P.O. Box City

State

Zip

Greeted as [illegible] Citizenship

Inventor's Full Name

Zip

Do you have a Residential P.O. Address? P.O. Box [illegible]

State

Zip

Greeted as [illegible] Citizenship

Inventor's Full Name

City

Zip

Do you have a Residential P.O. Address? P.O. Box City

State

Zip

Greeted as [illegible] Citizenship




Description of Invention: Please preserve all records of the invention and attach additional pages for the following. Each additional page should
be signed and dated by the inventor(s) and witness(es).

A. Description of the construction and operation of the invention (include appropriate schematic, block & timing diagrams; drawings; samples;
graphs; flowcharts; computer listings; test results; etc.)

See attached document 



B. Advantages of the invention over what has been done before.

The present invention provides a highly reliable, simple, low-cost, and easily embeddable method for identification of the print medium in a
printer. It works by microscopic (order-of-[illegible] pixel) imaging of the surface texture of the print medium. This has been proven to be a
reliable method of distinguishing between different types of papers and some overhead transparency films.

Other methods developed for HP DeskJet printers cannot provide accurate discrimination between similar media types from lower-resolution
measurements of diffuse and specular reflection.

C. Problems solved by the invention

HP studies have shown that few users correctly set printer driver dialog box values for the type of paper in the printer. Many never even make
this selection. The resulting print [illegible]
number of drops of ink per pixel, number of passes, color maps, etc.) are not matched to the paper. The result is an inferior print. This can
lead to sig[illegible]

This invention provides a highly reliable method to match printer settings to the print medium, resulting in optimal print quality.

D. Prior solutions and their disadvantages (if available, attach copies of product literature, technical articles, patents, etc.).

Some new models of HP DeskJets (e.g., 990C and PhotoSmart 1220) have a simple media type detector that uses diffuse and specular
reflection measured [illegible] to discriminate between media types. This method is simple and reasonably
inexpensive, but not as accurate as required in discriminating among different papers.






CWM with Application to Media Classification
1. Introduction to Cluster-Weighted Modeling



Cluster-Weighted Modeling is an input/output inference framework based on probability density
estimation of a joint set of features and target data. It is similar to mixture-of-experts type
architectures and can be interpreted as a flexible and transparent technique to approximate an
arbitrary function [1]. During training, clusters automatically "go to where the data is" and approximate
subsets of the data space according to a smooth domain of influence. Globally, the influence
of different clusters is weighted by Gaussian basis terms, while locally each cluster represents a
simple model such as a linear regression function. Thus, previous results from linear system
theory, linear time series analysis, and other traditional and modern frameworks are applied within
the broader context of a globally non-linear model.

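Non-linear system modeling uses models with linear coefficients $\beta_m$ and non-linear basis
functions $f_m(\mathbf{x})$,

$$ y(\mathbf{x}) = \sum_{m=1}^{M} \beta_m \, f_m(\mathbf{x}) \tag{1} $$

as in, for example, a polynomial expansion. Models may instead have the coefficients inside the
non-linearities,

$$ y(\mathbf{x}) = \sum_{m=1}^{M} f_m(\mathbf{x}, \beta_m) \tag{2} $$

as in, for example, a neural network.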

In the case of the generalized linear model (1), only a single matrix pseudo-inverse is needed to find
the set of coefficients yielding the minimum mean-square error. However, the number of
coefficients in (1) is exponential in the dimension of $\mathbf{x}$. Equation (2) [2] has more expressive
power, which can reduce the number of coefficients needed for a given approximation error to
linear in the dimension of $\mathbf{x}$; however, the non-linear parameters of (2) require an iterative
search [3].
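To make the contrast concrete, the following minimal sketch (Python with NumPy, an assumption; the disclosure names no implementation) fits a one-dimensional polynomial instance of the generalized linear model (1) with a single pseudo-inverse. The function name fit_generalized_linear and the test data are illustrative only.

```python
import numpy as np

def fit_generalized_linear(x, y, degree):
    """Least-squares fit of y(x) = sum_m beta_m * f_m(x) with f_m(x) = x**m."""
    # Design matrix: column m holds the basis function f_m evaluated at every sample.
    F = np.vander(x, degree + 1, increasing=True)
    # One pseudo-inverse yields the minimum mean-square-error coefficients.
    return np.linalg.pinv(F) @ y

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 0.5 - 2.0 * x + 3.0 * x**2          # noise-free quadratic, for illustration only
beta = fit_generalized_linear(x, y, degree=2)   # close to [0.5, -2.0, 3.0]
```

With D input dimensions and k basis functions per dimension, a full tensor-product basis of this kind needs k**D coefficients, which is the exponential growth noted above.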

CWM uses simple local models, which satisfy equation (1), and uses models as described in
equation (2) to create global models. Hence, CWM combines the efficient estimation of the
former with the benefits of the latter. The local parameters are fitted by an SVD (Singular Value
Decomposition) matrix inversion of the local covariance matrix. The remaining cluster
parameters in charge of the global weighting are found using a variant of the Expectation-
Maximization (EM) algorithm [4]. EM is an iterative search that maximizes the model likelihood
given a data set and initial conditions. Initial values for the cluster parameters are
picked randomly or according to the application, and then the Expectation step starts.






o Expectation Step:

In the E-step, the current cluster parameters are assumed correct, and the posterior probabilities
that relate each cluster to each data point are evaluated. These probabilities can be interpreted as
the probability that a particular cluster generated a particular data point, or as the normalized
responsibility of a cluster for a data point:

$$ p(c_m \mid y, \mathbf{x}) = \frac{p(y \mid \mathbf{x}, c_m)\, p(\mathbf{x} \mid c_m)\, p(c_m)}{\sum_{l=1}^{M} p(y \mid \mathbf{x}, c_l)\, p(\mathbf{x} \mid c_l)\, p(c_l)} \tag{3} $$

where the sum over clusters in the denominator causes clusters to interact, fight over points, and
specialize in the data they best explain.
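A minimal sketch of this E-step, assuming (as in standard CWM formulations, though not spelled out here) Gaussian input densities p(x | c_m), linear local models, and Gaussian output noise. The names log_gauss and e_step are hypothetical; the normalization over clusters, the denominator above, is done in log space with np.logaddexp.reduce to avoid underflow.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log of a multivariate normal density evaluated at each row of x."""
    diff = x - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2.0 * np.pi))

def e_step(x, y, weights, means, covs, betas, out_vars):
    """Posterior p(c_m | y_n, x_n): one row per sample n, one column per cluster m.

    Assumes linear local models f(x, beta_m) = beta_m[0] + beta_m[1:] . x
    with Gaussian output noise of variance out_vars[m].
    """
    cols = []
    for w, mu, cov, b, v in zip(weights, means, covs, betas, out_vars):
        resid = y - (b[0] + x @ b[1:])
        log_py = -0.5 * (resid**2 / v + np.log(2.0 * np.pi * v))   # log p(y | x, c_m)
        cols.append(log_py + log_gauss(x, mu, cov) + np.log(w))    # + log p(x | c_m) p(c_m)
    log_joint = np.column_stack(cols)
    # Sum over clusters (the denominator), computed stably in log space.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm)
```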

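o Maximization Step:

In the M-step, the current input-output data distribution is assumed correct, and the cluster
parameters are found to maximize the likelihood of the data. The new estimate for the
unconditioned cluster probabilities is

$$ p(c_m) = \int p(c_m \mid y, \mathbf{x})\, p(y, \mathbf{x})\, dy\, d\mathbf{x} \approx \frac{1}{N} \sum_{n=1}^{N} p(c_m \mid y_n, \mathbf{x}_n) \tag{4} $$

An integral over a density can be approximated by an average over variables drawn from that
density. Next, the expected input mean of each cluster is computed, which is the estimate of the
new cluster mean:

$$ \boldsymbol{\mu}_m = \int \mathbf{x}\, p(\mathbf{x} \mid c_m)\, d\mathbf{x} = \int \mathbf{x}\, \frac{p(c_m \mid y, \mathbf{x})}{p(c_m)}\, p(y, \mathbf{x})\, dy\, d\mathbf{x} \approx \frac{\sum_{n=1}^{N} \mathbf{x}_n\, p(c_m \mid y_n, \mathbf{x}_n)}{\sum_{n=1}^{N} p(c_m \mid y_n, \mathbf{x}_n)} \tag{5} $$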
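Both updates reduce to posterior-weighted averages. A sketch, assuming the E-step posteriors are held in an N x M NumPy array post (a hypothetical layout) and the inputs in an N x D array x:

```python
import numpy as np

def update_weights_and_means(x, post):
    """M-step for the cluster weights p(c_m) and the input means mu_m.

    x: (N, D) inputs; post: (N, M) posteriors p(c_m | y_n, x_n) from the E-step.
    """
    resp = post.sum(axis=0)                    # sum_n p(c_m | y_n, x_n), one per cluster
    weights = resp / x.shape[0]                # average posterior approximates p(c_m)
    means = (post.T @ x) / resp[:, None]       # posterior-weighted input mean, as in (5)
    return weights, means
```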



The introduction of the output $y$ into (5) means that the cluster parameters are found with respect to
both input and output data. Clusters get pulled based on both where the data to be explained lies
and how well their model explains the data. For any function $\theta(\mathbf{x})$, or more generally $\theta(y, \mathbf{x})$, similar to (5),

$$ \langle \theta(y, \mathbf{x}) \rangle_m = \int \theta(y, \mathbf{x})\, p(y, \mathbf{x} \mid c_m)\, dy\, d\mathbf{x} \approx \frac{\sum_{n=1}^{N} \theta(y_n, \mathbf{x}_n)\, p(c_m \mid y_n, \mathbf{x}_n)}{\sum_{n=1}^{N} p(c_m \mid y_n, \mathbf{x}_n)} \tag{6} $$

which leads to the cluster-weighted covariance matrices:

$$ [\mathbf{P}_m]_{ij} = \langle (x_i - \mu_{m,i})(x_j - \mu_{m,j}) \rangle_m \approx \frac{\sum_{n=1}^{N} (x_{n,i} - \mu_{m,i})(x_{n,j} - \mu_{m,j})\, p(c_m \mid y_n, \mathbf{x}_n)}{\sum_{n=1}^{N} p(c_m \mid y_n, \mathbf{x}_n)} \tag{7} $$
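Equation (7) can be sketched the same way. The optional ridge term is an assumption added for numerical safety and is not part of the disclosure; cluster_covariances is a hypothetical name.

```python
import numpy as np

def cluster_covariances(x, post, means, ridge=0.0):
    """Cluster-weighted input covariance matrices, following (6) and (7).

    x: (N, D) inputs; post: (N, M) posteriors; means: (M, D) cluster means.
    """
    resp = post.sum(axis=0)
    covs = np.empty((post.shape[1], x.shape[1], x.shape[1]))
    for m in range(post.shape[1]):
        diff = x - means[m]
        # <(x_i - mu_i)(x_j - mu_j)>_m: posterior-weighted outer products, normalized.
        covs[m] = (post[:, m, None] * diff).T @ diff / resp[m]
        # Optional ridge (assumption) keeps P_m invertible if a cluster collapses.
        covs[m] += ridge * np.eye(x.shape[1])
    return covs
```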



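For updating the local model, the model parameters are found by setting to zero the derivative of the log
of the total likelihood function with respect to the parameters:

$$ 0 = \frac{\partial}{\partial \beta_m} \log \prod_{n=1}^{N} p(y_n, \mathbf{x}_n) = \sum_{n=1}^{N} \frac{1}{p(y_n, \mathbf{x}_n)}\, \frac{\partial p(y_n, \mathbf{x}_n)}{\partial \beta_m} $$

For a single output $y$ and a single coefficient $\beta_m$, with a Gaussian output density $p(y \mid \mathbf{x}, c_m)$
of mean $f(\mathbf{x}, \beta_m)$ and variance $\sigma_{m,y}^2$, only the term of cluster $m$ in
$p(y_n, \mathbf{x}_n) = \sum_l p(y_n, \mathbf{x}_n, c_l)$ depends on $\beta_m$, so

$$ 0 = \sum_{n=1}^{N} \frac{p(y_n, \mathbf{x}_n, c_m)}{p(y_n, \mathbf{x}_n)}\, \frac{y_n - f(\mathbf{x}_n, \beta_m)}{\sigma_{m,y}^2}\, \frac{\partial f(\mathbf{x}_n, \beta_m)}{\partial \beta_m} \approx \frac{N\, p(c_m)}{\sigma_{m,y}^2} \left\langle \bigl(y - f(\mathbf{x}, \beta_m)\bigr)\, \frac{\partial f(\mathbf{x}, \beta_m)}{\partial \beta_m} \right\rangle_m \tag{8} $$

and therefore

$$ 0 = \left\langle \bigl(y - f(\mathbf{x}, \beta_m)\bigr)\, \frac{\partial f(\mathbf{x}, \beta_m)}{\partial \beta_m} \right\rangle_m \tag{9} $$

Putting equation (1) into (9), i.e., using a local model that is linear in its coefficient,
$f(\mathbf{x}, \beta_m) = \beta_m f(\mathbf{x})$, the expression to update $\beta_m$ is obtained:

$$ 0 = \langle y\, f(\mathbf{x}) \rangle_m - \beta_m \langle f(\mathbf{x})^2 \rangle_m \tag{10} $$

$$ \beta_m = \frac{\langle y\, f(\mathbf{x}) \rangle_m}{\langle f(\mathbf{x})^2 \rangle_m} \tag{11} $$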



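For a whole set of model parameters $\boldsymbol{\beta}_m = (\beta_{m,1}, \ldots, \beta_{m,I})$ with the local model
$f(\mathbf{x}, \boldsymbol{\beta}_m) = \sum_{i=1}^{I} \beta_{m,i} f_i(\mathbf{x})$, equation (11) expands to

$$ \boldsymbol{\beta}_m = \mathbf{B}_m^{-1} \cdot \mathbf{A}_m \tag{12} $$

with

$$ [\mathbf{B}_m]_{ij} = \langle f_i(\mathbf{x})\, f_j(\mathbf{x}) \rangle_m, \qquad [\mathbf{A}_m]_i = \langle y\, f_i(\mathbf{x}) \rangle_m \tag{13} $$

and finally, the output covariance matrices associated with each model are estimated by

$$ \mathbf{P}_{m,y} = \left\langle \bigl(\mathbf{y} - \mathbf{f}(\mathbf{x}, \boldsymbol{\beta}_m)\bigr)\bigl(\mathbf{y} - \mathbf{f}(\mathbf{x}, \boldsymbol{\beta}_m)\bigr)^{T} \right\rangle_m \tag{14} $$

which, for a single output, reduces to the output variance $\sigma_{m,y}^2 = \langle (y - f(\mathbf{x}, \boldsymbol{\beta}_m))^2 \rangle_m$.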
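A sketch of the local-model update for one cluster, assuming the basis f_i(x) = [1, x_1, ..., x_D] (a linear local model; a constant model keeps only the first column). np.linalg.pinv is used because NumPy computes the pseudo-inverse via SVD, in line with the recommendation to invert by SVD; fit_local_model is a hypothetical name.

```python
import numpy as np

def fit_local_model(x, y, post_m):
    """Local-model update for one cluster m with basis f(x) = [1, x_1, ..., x_D].

    post_m: (N,) posteriors p(c_m | y_n, x_n); returns (beta_m, output variance).
    """
    F = np.hstack([np.ones((x.shape[0], 1)), x])   # F[n, i] = f_i(x_n)
    w = post_m / post_m.sum()                        # normalized weights realize <.>_m
    B = (F * w[:, None]).T @ F                       # [B_m]_ij = <f_i f_j>_m
    A = (F * w[:, None]).T @ y                       # [A_m]_i  = <y f_i>_m
    beta = np.linalg.pinv(B) @ A                     # (12); pinv is SVD-based
    resid = y - F @ beta
    out_var = float(w @ resid**2)                    # <(y - f(x, beta_m))^2>_m
    return beta, out_var
```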

o Summary of the CWM Model Estimation Process (E-M Iteration)

1. Pick some initial conditions and initial cluster values;
2. Evaluate the probability of the data $p(y, \mathbf{x} \mid c_m)$;
3. Find the posterior probability of the clusters $p(c_m \mid y, \mathbf{x})$;
4. Update
   (i) the cluster weights $p(c_m)$;
   (ii) the cluster-weighted expectations for the input means $\boldsymbol{\mu}_m$;
   (iii) the variances or covariances $\mathbf{P}_m$;
   (iv) the maximum likelihood model parameters $\boldsymbol{\beta}_m$; and finally
   (v) the output variances $\sigma_{m,y}^2$;
5. Go back to step 2 until the total data likelihood no longer increases.
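Putting steps 1 to 5 together, a compact sketch of the whole estimation loop follows. It assumes Gaussian input clusters, linear local models with a scalar output, and an initialization in the spirit of the one given later under "Initial Location of the Clusters" (random data points, data variance as the initial spread). The ridge term, the 1e-8 stopping tolerance, and the name train_cwm are assumptions, not details from the disclosure.

```python
import numpy as np

def _log_gauss(x, mean, cov):
    diff = x - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2.0 * np.pi))

def train_cwm(x, y, n_clusters, max_iter=100, seed=0, ridge=1e-6):
    """EM for CWM with Gaussian input clusters and linear local models (sketch).

    Returns the parameters and the total log-likelihood recorded at each pass.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    F = np.hstack([np.ones((n, 1)), x])               # local basis [1, x]
    pick = rng.choice(n, n_clusters, replace=False)
    # Step 1: 1/M weights; means and output offsets at random data points;
    # initial spread equal to the data variance in each dimension.
    w = np.full(n_clusters, 1.0 / n_clusters)
    mu = x[pick].copy()
    cov = np.array([np.diag(x.var(axis=0)) + ridge * np.eye(d)] * n_clusters)
    beta = np.zeros((n_clusters, d + 1))
    beta[:, 0] = y[pick]
    var = np.full(n_clusters, y.var() + ridge)
    history = []
    for _ in range(max_iter):
        # Step 2: log p(y, x | c_m) + log p(c_m) for every sample and cluster.
        lj = np.column_stack([
            -0.5 * ((y - F @ beta[m])**2 / var[m] + np.log(2.0 * np.pi * var[m]))
            + _log_gauss(x, mu[m], cov[m]) + np.log(w[m])
            for m in range(n_clusters)])
        ll = np.logaddexp.reduce(lj, axis=1)
        history.append(float(ll.sum()))
        # Step 5: stop once the total data likelihood no longer increases.
        if len(history) > 1 and history[-1] - history[-2] < 1e-8:
            break
        post = np.exp(lj - ll[:, None])                # Step 3: p(c_m | y, x)
        resp = post.sum(axis=0)
        w = resp / n                                   # Step 4 (i)
        mu = post.T @ x / resp[:, None]                # Step 4 (ii)
        for m in range(n_clusters):
            wm = post[:, m] / resp[m]
            diff = x - mu[m]
            cov[m] = (wm[:, None] * diff).T @ diff + ridge * np.eye(d)   # (iii)
            wF = wm[:, None] * F
            beta[m] = np.linalg.pinv(wF.T @ F) @ (wF.T @ y)              # (iv)
            var[m] = wm @ (y - F @ beta[m])**2 + ridge                   # (v)
    return {"w": w, "mu": mu, "cov": cov, "beta": beta, "var": var}, history
```

Because each M-step update here is the exact weighted maximum-likelihood solution (apart from the tiny ridge), the recorded log-likelihood should not decrease from one pass to the next, which is a useful sanity check on any implementation.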



2. The CWM Algorithm in a Practical Media Identification Sensor

(1) A Practical Media Identification Sensor

A practical media identification sensor allows a printer to determine the type of paper (i.e., "print
medium") in the print zone or paper tray and to adjust the print engine parameters accordingly
for optimal print quality. Furthermore, identification of the presence of certain types of
transparency film or special papers can be used to prevent damage to the print engine. For
example, the coatings on some ink jet transparency films can melt on the fuser roller of an
electrophotographic (e.g., HP "LaserJet") printer, causing damage that requires the fuser roller to
be replaced.

Easily observed using a microscope and grazing illumination (e.g., 45° to 75° from the surface
normal) is the surface texture of papers and some transparency films. This surface texture has
features with characteristic sizes ranging from about 5 µm to about 100 µm. Each type of
print medium has a characteristic surface texture, and this provides the fundamental principle for
media identification using analysis of microscopic surface features.

* The matrix inversion should be done by SVD (Singular Value Decomposition) to avoid possible numerical problems
with singular covariance matrices.

An automatic device to discriminate among and thereby identify print media can be built using
an image sensor employing a single pixel, a line of pixels, or a two-dimensional array of pixels.
Depending upon the size of the sensor's pixel(s), optics image a specified area on the medium's
surface onto the pixel. Typically, the viewed area of the print medium surface is a square about
5 µm to about 100 µm on a side, with 10 to 40 µm giving practical results.

Surface texture can be characterized by a collection of measured gray-level values obtained by
multiple samples over an unprinted area of the print medium's surface. Multiple samples may be
obtained by scanning a single-pixel sensor over the medium surface and taking measurements at
different places (cf. patent applications by Steve Walker, HP VCD) or with a linear or area array
of pixels. An advantage of a line or area sensor over a single-pixel sensor (e.g., a photodiode, a
phototransistor, or an integrated "light-to-voltage" or "light-to-frequency" sensor) is that multiple
samples over a region of the print medium's surface may be obtained without requiring relative
motion between the sensor and the print medium. This is useful for simplifying the mechanism
for identifying the print medium in the input tray, where no motion source is generally available
until the medium is fed into the paper path. Alternatively, a single-pixel, line, or area sensor may
accumulate multiple samples of the surface of the print medium as the print medium is fed from
the input tray into the paper path, at a point along the paper path where the medium passes
under the (fixed) sensor, or by placing the sensor on a scanning print carriage where it moves
across the stationary print medium. The objective of all these implementations is to accumulate
multiple samples at different locations so as to evaluate variation in surface texture. In general,
the objective is to improve the sampling statistics with more samples.

The image sensor preferably has its optical axis along the normal to the plane of the print
medium and captures an image of the surface illuminated by multiple wavelengths, for example
those produced by green and blue LEDs, arranged to provide illumination at an incidence angle
between about 45° and 75° from the surface normal. These LEDs are illuminated sequentially,
and pixel measurements are taken under each illuminant. More accurate identification involves
the use of multiple illumination sources at different incidence angles¹. In practice, these samples
can be taken over the identical physical region or over a different region.

A practical method uses one or more (LED) illuminants, a linear array of pixels sampling at a
single fixed location on the print medium surface, and optics to image a region of [illegible] µm square
onto each pixel. This arrangement has provided reliable results without the necessity of moving
the medium relative to the sensor or vice versa.

The mean of the gray-level values of pixel data and their standard deviation are taken from
images of microscopic surface features under illuminants with different wavelengths and angles
of incidence. The mean value is the average reflectivity of the media, and the standard deviation
represents a measure of the texture roughness of the media. Training is required to establish a
LUT (look-up table) for different media types/groups², and when an unknown medium is sampled,
the media ID sensor identifies the sample by finding its closest match to the reference set.

¹ A developmental version of the prototype media identification sensor used four LED illuminators: green and blue at a
45° angle, and red and infrared at a 75° angle with respect to the surface normal. This provided more experimental
flexibility during development and more degrees of freedom in the sampled data.
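A sketch of the per-illuminant feature computation described above, in plain Python. The dictionary layout, the illuminant names, and the use of the population standard deviation (statistics.pstdev) rather than the sample standard deviation are assumptions; the text does not specify them.

```python
from statistics import mean, pstdev

def texture_features(pixels_by_illuminant):
    """Return [mu, sigma] for each illuminant, concatenated in sorted name order.

    pixels_by_illuminant: dict mapping an illuminant name (e.g. "green", "blue")
    to the gray levels read from the pixel array while only that LED is lit.
    """
    features = []
    for name in sorted(pixels_by_illuminant):
        levels = pixels_by_illuminant[name]
        features.append(mean(levels))     # average reflectivity of the medium
        features.append(pstdev(levels))   # spread of gray levels: texture roughness
    return features

# Blue comes first in sorted order: [50, 0.0, 100, ~7.07]
feats = texture_features({"green": [100, 110, 90, 100], "blue": [50, 50, 50, 50]})
```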

(2) CWM Algorithm for a Practical Media Identification Sensor

A group of different media types for LaserJet or Inkjet printers can be characterized as a non-linear,
non-Gaussian, complex multi-dimensional input-output system. The input of the system is
the mean/standard deviation (µ/σ) pair computed from the pixel data sampled by the sensor under
two different illuminants at specified angles of incidence. The output is the best match of the
unknown print medium to reference media types or media groups.

One of the easier ways to achieve the goal of media identification is to take enough data samples
from each media type and compute the means and standard deviations for each illuminant at its
angle of incidence. Then, the mean and standard deviation of these means and standard deviations
are computed for each media type and stored in the LUT. When a new set of (µ/σ) data from two
illuminants is computed for an unknown media type, the distances from the (µ/σ) of the new
data to those of the media types in the LUT are calculated, and the media type/group is then
determined by some function of these distances; the simplest solution is to find the minimum
distance. This approach is similar to using the same number of clusters as the number of media
types in the CWM algorithm. It provides satisfactory results only if the media data
clouds are relatively symmetric and non-singular. Otherwise, the error of this approach can be
large, and further reduction of the error is impossible. As shown in Figure 1, which shows the (µ/σ) data
clouds for seven LaserJet media types, the assumption that each media type is symmetric and
non-singular in the (µ/σ) domain is not accurate. In fact, it is just the opposite: the data clouds are very
asymmetric and singular.
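The LUT approach just described can be sketched as a nearest-mean classifier. The text leaves the distance function open ("some function of these distances"); this sketch assumes a Euclidean distance with each feature scaled by the stored standard deviation, and build_lut and classify are hypothetical names.

```python
from math import sqrt
from statistics import mean, pstdev

def build_lut(training):
    """Map each media type to (per-feature means, per-feature standard deviations).

    training: dict media type -> list of feature vectors (e.g. mu/sigma per illuminant).
    """
    lut = {}
    for media, vectors in training.items():
        columns = list(zip(*vectors))                     # one tuple per feature
        lut[media] = ([mean(c) for c in columns],
                      [pstdev(c) or 1.0 for c in columns])  # avoid zero-spread division
    return lut

def classify(lut, features):
    """Return the media type whose LUT entry is nearest in standardized distance."""
    def distance(entry):
        means, stds = entry
        return sqrt(sum(((f - m) / s) ** 2 for f, m, s in zip(features, means, stds)))
    return min(lut, key=lambda media: distance(lut[media]))
```

Because each media type is summarized by a single mean and spread per feature, this classifier implicitly assumes compact, roughly symmetric data clouds, which is exactly the limitation described above.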

The CWM framework provides an ideal solution for this problem. In CWM, the input vector is
defined for samples taken under green and blue illuminants as:

$$ \mathbf{x} = \left[\mu_G,\ \sigma_G,\ \mu_B,\ \sigma_B\right]^{T} $$

where the subscripts G and B denote the green and blue illuminants, and the output vector, in this case a scalar $y$, is one of the media types. In training, the set of
vector pairs $\{y_i, \mathbf{x}_i\}$ is used to train the CWM input-output model using a simple local
model $y = \beta_m$. In the synthesis process, an unknown input vector $\mathbf{x}_j$ is applied to the
predictor, which calculates $p(y \mid \mathbf{x}_j)$ according to the trained CWM model to provide the
probabilities of that input vector with respect to all media types in the training set. The
probabilities of the unknown media with respect to different media groups can be further summed
up by adding all probabilities for the media types that belong to the media group.
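A hedged sketch of the prediction and group-summing steps. It assumes the constant local model y = β_m, Gaussian input clusters, and media types coded as numbers; p(y | x) is evaluated at each code and renormalized over the listed types, which is one possible reading of the predictor described above, not necessarily the one used in the prototype. Function and argument names are hypothetical.

```python
import numpy as np

def media_probabilities(x, weights, means, covs, levels, out_vars, labels):
    """Normalized p(media type | x) for one input vector x (hypothetical helper).

    Assumes constant local models y = beta_m (levels[m]) with output variances
    out_vars[m], and media types coded as numbers in `labels` (name -> code).
    """
    # p(x | c_m) p(c_m) for each cluster, from its Gaussian input density.
    gate = np.array([
        w * np.exp(-0.5 * (x - mu) @ np.linalg.solve(P, x - mu))
        / np.sqrt(np.linalg.det(2.0 * np.pi * P))
        for w, mu, P in zip(weights, means, covs)])
    scores = {}
    for name, code in labels.items():
        # p(y = code | x, c_m) for each cluster, mixed by the cluster weights.
        p_y = np.exp(-0.5 * (code - levels)**2 / out_vars) / np.sqrt(2.0 * np.pi * out_vars)
        scores[name] = float(gate @ p_y)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def group_probabilities(type_probs, groups):
    """Sum media-type probabilities into media-group probabilities."""
    return {g: sum(type_probs[t] for t in members) for g, members in groups.items()}
```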

The training process is both time-consuming and computationally intensive. It is time-consuming especially in the
process of gathering all the different media samples: it takes several thousand input vectors for each
media type to provide a good estimate of the media distribution (i.e., "the data cloud"). It is
computationally intensive because of the required statistics calculations and matrix manipulations.

² Media Type refers to an individual LaserJet or Inkjet media product, for example, HP Premium Glossy Photo
Paper, HP Multipurpose Paper, Xerox Xpression Paper, etc. Media Group refers to media types that have similar
recording characteristics and would use similar print engine parameters such as drop volume, number of drops per
pixel, etc.



Fortunately, this process can be carried out off-line and only once for all media types/groups to
be used for a particular printer. The training process is updated only when new media
types/groups are introduced, or with changes in the optics/electronics design of the media sensor.

It is practical to train a printer on new media types if bidirectional communication exists between
a printer and its host computer and appropriate software is installed on the host. In this case, the
training for additional media types could occur during a time when the printer is idle. The media
identification sensor would provide the raw pixel data to the host for processing and association
with the new media type sample. Software to accomplish this task could be conveniently
downloaded from the Internet or be shipped with the printer as part of a printing-solutions
software application.

In any case, after training, a small predictor (on the order of a few hundred lines of C code) and the
ensemble of cluster parameters are all that is needed to implement an embedded media
identification solution. This entire process could execute within the printer or on the host. In
the first case, the printer resources must include some image-processing capability to optimize the
raster image data for rendering by a particular print algorithm (e.g., for ink jet: dot levels per
pixel, number of print carriage passes; for electrophotographic printers: feed rate, fuser
temperature, etc.). In this case, partially rendered image data is presented to the printer by the
host, and the printer completes the rendering process. This method is used in some HP ink jet
printers (e.g., DeskJet 950, 990, 2200) with the so-called "HP High Performance Architecture."
In another method, the pixel measurements are uploaded to the host from the printer for
processing. The raster image processing is done in the host, and a fully rendered image is sent to
the printer. In principle, some combination of these processes could be used.

The size of the cluster parameters is determined by the dimensions of the input and output.
Therefore, the size of the LUT for CWM is determined by the number of clusters used and the dimensions of the
input-output vector pair. The LUT should be relatively small: a few Kbytes. Therefore, the
whole CWM implementation in the printer or media sensor should have a footprint of several
Kbytes, which is extremely small by current memory standards. It should also be relatively fast,
thereby minimizing the impact on throughput.
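A back-of-the-envelope check of the "few Kbytes" figure, as a small Python helper. The storage layout (32-bit floats, covariance kept as an upper triangle, constant local model) is an assumption; the 24 clusters and the four-value input (µ and σ under two illuminants) are figures given elsewhere in this document. cwm_table_bytes is a hypothetical name.

```python
def cwm_table_bytes(n_clusters, input_dim, n_coeffs=1, bytes_per_value=4):
    """Rough storage estimate for the CWM cluster parameters (assumed layout).

    Per cluster: weight p(c_m) (1 value), input mean (D), symmetric input
    covariance kept as its upper triangle (D(D+1)/2), local-model coefficients
    (n_coeffs; 1 for a constant model y = beta_m), and output variance (1).
    """
    per_cluster = 1 + input_dim + input_dim * (input_dim + 1) // 2 + n_coeffs + 1
    return n_clusters * per_cluster * bytes_per_value

# 24 clusters, 4-D input, 32-bit floats: 24 * 17 * 4 bytes
print(cwm_table_bytes(24, 4))   # prints 1632
```

Even with a linear local model (five coefficients for a 4-D input), the same estimate gives 2016 bytes, comfortably within "a few Kbytes".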

(3) Preferred Cluster-Weighted Modeling Implementation on the Prototype Media Sensor

1. Sensor Optic Focus Calibration: The optics are designed and focused to ensure that a
pixel resolution of 8 µm square on the medium surface, with an optical blur circle of
about 20 to 25 µm, can be achieved.

2. Sensor Calibration: There are several noise sources in any image sensor and data
acquisition system, which must be eliminated or reduced as much as possible. The major
sources of noise are (1) sensor electronic noise (dark current), (2) sensor photon shot
noise, (3) pixel-to-pixel variation, and (4) illumination non-uniformity caused by the
source. The first two noise sources are random in nature and can only be effectively
reduced by averaging. Their impact on the measurement is minor with the choice of
adequate illumination levels. Sensor pixel-to-pixel noise is a fixed, high-spatial-frequency
noise, and the illumination non-uniformity is a fixed, low-spatial-frequency effect. These
two noise sources are significant and must be addressed. The method to reduce these effects
involves taking samples by imaging a white tile illuminated at several intensity levels.
The high- and low-frequency effects are separated, and a correction LUT (with values
depending on average illumination) is applied to individual pixel outputs.

3. Black Backing for the Measurement: A black tile is required to back each sheet of
the print medium sample during measurement. This eliminates the effects of light penetrating
multiple sheets and provides a consistent and optimized sampling environment. It is
important that the optical absorption characteristics of the tile used in training be identical
to those of the tile used in the practical measurement. The black tile could be conveniently replaced
with an opening leading into a nonreflective chamber, which should provide similar
results.

4. Training Set Generation: When teaming, an area sensor imager can be used to speed up 
the sampling process. The area imager seasl&r data is then sub^sampled to the same 
numbers of pixel as the final production line sensor. An input vector for blue & grean 
Uluminatkm, for example, \y^_w*&^>*a™*^* a *M is ibm com P u,?ed frotn the 
sub-sampled pixels. For each media type ; a few thousands of these input vectors are 
required to reconstruct reliably the data clouds of that particular media type*. Once all 
different media types are sampled, the input vectors are randomised to provide a better 
training set. 

5. Optimization of CWM Algorithm for Media Classification: 

a. Initial Location of the Clusters: Clusters should not be initialized arbitrarily, since
the algorithm is only guaranteed to converge to a local likelihood maximum. The
clusters should be placed as close to their final positions as possible to save
training time and obtain better convergence. The method of selecting the initial
cluster positions is as follows: choose 1/N as the initial cluster probabilities, where
N is the number of clusters. Randomly pick as many points from the training
set as there are clusters, and initialize the cluster input means, as well as the cluster
output means, with these points. Set the remaining output coefficients to zero. Use
the size of the data set in each space dimension as the initial cluster variances.
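A minimal sketch of this initialization follows. It assumes a scalar output and a local linear output model per cluster, with coefficients `[a0, a1, ..., ad]` where `a0` plays the role of the output mean; it also reads "size of the data set" as the range (max minus min) in each dimension. These are interpretations, and the `Cluster` type and `init_clusters` name are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Cluster:
    prior: float          # p(c_k), the cluster probability
    input_mean: list      # cluster input mean, one entry per input dimension
    input_var: list       # diagonal input variances
    output_coeffs: list   # local linear model [a0, a1, ..., ad]; a0 is the output mean
    output_var: float     # output variance


def init_clusters(xs, ys, n_clusters, seed=0):
    """Initialize CWM clusters: equal priors, means at randomly picked training points.

    xs -- list of input vectors; ys -- list of scalar outputs (same length)."""
    rng = random.Random(seed)
    dims = len(xs[0])
    # "Size" of the data set per dimension, read here as its range (max - min).
    in_size = [max(x[d] for x in xs) - min(x[d] for x in xs) for d in range(dims)]
    out_size = max(ys) - min(ys)
    picks = rng.sample(range(len(xs)), n_clusters)  # distinct training points
    return [Cluster(prior=1.0 / n_clusters,
                    input_mean=list(xs[i]),
                    input_var=list(in_size),
                    output_coeffs=[ys[i]] + [0.0] * dims,  # remaining coefficients zero
                    output_var=out_size)
            for i in picks]
```

Starting every cluster with the full data range as its variance lets each one initially "see" the whole data set, so EM can pull them toward their own regions.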

b. Normalization: The training set must be normalized to zero mean and unit
variance, since arbitrary data values may cause the probabilities to become too small.
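The normalization is the usual per-dimension z-score; a sketch follows, with the hypothetical helper name `normalize`. It also returns the means and standard deviations so that later measurements can be scaled with the same statistics. The last line shows, as a plain floating-point fact, why unnormalized data is a problem: a Gaussian factor for a distance of 40 standard units underflows to zero in double precision.

```python
import math


def normalize(xs):
    """Scale each column of xs to zero mean and unit variance.

    Returns the normalized rows plus per-column means and standard deviations."""
    n, dims = len(xs), len(xs[0])
    means = [sum(x[d] for x in xs) / n for d in range(dims)]
    stds = [math.sqrt(sum((x[d] - means[d]) ** 2 for x in xs) / n) for d in range(dims)]
    stds = [s if s > 0 else 1.0 for s in stds]  # leave constant columns unscaled
    rows = [[(x[d] - means[d]) / stds[d] for d in range(dims)] for x in xs]
    return rows, means, stds


# Why it matters: exp(-d**2 / 2) underflows to 0.0 for a raw distance of d = 40.
print(math.exp(-(40.0 ** 2) / 2))  # 0.0
```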

c. Optimizing the Number of Clusters and the Number of EM Iterations: There is no rule for
how many clusters are optimal for a specific problem. The number of clusters should be
larger than the number of distinguishable outputs, in this case the number of media types.
However, more clusters do not mean better discrimination. With too many small clusters,
establishing membership may be difficult, especially when a region is populated with many
small clusters belonging to different media types. The same holds for the number of EM
training iterations when the number of clusters is fixed. Therefore, the number of clusters
has to be searched from small to large, together with the number of training iterations from
small to large for each number of clusters, and the best combination determined empirically.
For example, for a sample of 7 LaserJet media, 24 clusters and 23 iterations were determined
to be optimal, and this provided the highest correct classification rate (refer to figure 1).
For 4 inkjet media, 6 clusters and 10 iterations provided the best possible result of 100%
correct media identification (refer to figure 3).

* In our testing, we took 4800 input vectors for each media type. The area sensor we used
is 80x80 pixels, and the sampled data is then sub-sampled [illegible] times to generate
[illegible] data points, which in turn generated 80 input
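The empirical search can be expressed as a nested loop. In this sketch `score(k, n_iter)` is a hypothetical, caller-supplied function that trains a CWM model with `k` clusters for `n_iter` EM iterations and returns the correct classification rate; the toy score in the usage line is made up and merely peaks at the LaserJet values quoted above.

```python
def search_clusters_and_iterations(score, n_types, cluster_counts, iteration_counts):
    """Try every (clusters, EM iterations) pair, each from small to large.

    Only cluster counts larger than the number of media types are tried.
    Returns (best rate, clusters, iterations), or None if nothing was tried."""
    best = None
    for k in sorted(c for c in cluster_counts if c > n_types):
        for n_iter in sorted(iteration_counts):
            rate = score(k, n_iter)
            if best is None or rate > best[0]:
                best = (rate, k, n_iter)
    return best


def toy_score(k, n_iter):
    """Made-up stand-in for 'train and measure the classification rate'."""
    return 1.0 - 0.001 * ((k - 24) ** 2 + (n_iter - 23) ** 2)


print(search_clusters_and_iterations(toy_score, 7, range(2, 41), range(1, 41)))  # (1.0, 24, 23)
```

In practice each call to `score` is a full training run, so the grid is usually kept coarse and refined around the best region.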

(4) Media Classification Using CWM Algorithm - Practical Results

Two cases of media classification were studied by HP Labs: LaserJet media and inkjet
media. It was obvious that the inkjet media show much better separation than the LaserJet media
(refer to figures 1 and 3).

The 7 LaserJet media types fall into two media groups, namely HP Multipurpose plain paper
(less than 30 lb, except CoverStock paper, which is 110 lb) and high-quality glossy paper (all over
30 lb). The cluster distribution and the final training and testing results for this case are shown in
figure 1 and figure 2. It is obvious from figure 1 that, at least for the partial input vector
[illegible], the media samples are all mixed and the data clouds have no clear boundary.

The overlapping and mixing of different media types is not nearly as bad as shown in figure 1
when the whole input vector (including the additional two dimensions for green illumination) is
considered. Still, it is a difficult system for estimation and prediction. With 24 clusters to
estimate 7 different media types, after 23 EM iterations the clusters settled onto the data
clouds shown, and it is clear that there is more than one cluster for each media type.
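When several clusters belong to the same media type, one simple way to assign membership (assumed here, not taken from the source) is to label each cluster with a media type and add up the cluster posteriors p(c_k | x) within each type. The sketch uses diagonal Gaussians; the tuple layout of `clusters` and the function names are hypothetical.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of x under a Gaussian with diagonal covariance."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-((xi - mi) ** 2) / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return p


def classify(x, clusters):
    """clusters: list of (prior, mean, var, media_type); several may share a type.

    Returns the most probable media type and the per-type probabilities,
    obtained by summing the cluster posteriors p(c_k | x) within each type."""
    joint = [prior * gaussian_diag(x, mean, var) for prior, mean, var, _ in clusters]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("x is too far from every cluster (densities underflowed)")
    probs = {}
    for weight, (_, _, _, media_type) in zip(joint, clusters):
        probs[media_type] = probs.get(media_type, 0.0) + weight / total
    return max(probs, key=probs.get), probs
```

This also shows why many small, interleaved clusters of different types hurt: their posteriors split the probability mass near the boundary, so no single type dominates.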



[Figure 1: Input Data Clouds Computed From Seven LaserJet Media Samples & Cluster-Weighted Estimation Results. Plot title: Input Vector Clouds of Seven LaserJet Media Types & CWM Training Results (Number of Clusters = 24 / Number of EM Iterations = 23). Horizontal axis: normalized [illegible] of 7 media samples under blue LED illumination.]






Figure 2 shows the training and testing results for the 7 media types. The yellow rows are the
media types in the multipurpose plain paper group (Types 1-4) and the gray rows are the three
media types in the high-quality glossy media group (Types 5-7). The rate for correctly identifying
an individual sample ranged from 77% to 99%. Membership of a sample in either the multipurpose
group or the glossy group was identified with very high accuracy.
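The two kinds of figures quoted above, per-type identification rates and group-level accuracy, can both be read off a confusion matrix. The sketch below shows how; the group assignment follows the text (Types 1-4 plain, Types 5-7 glossy), while the function names are hypothetical and no counts from figure 2 are reproduced.

```python
def per_type_rates(confusion):
    """confusion[i][j] = number of samples of true type i classified as type j.

    Returns the correct-identification rate of each media type."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def group_accuracy(confusion, group_of):
    """Fraction of samples whose predicted type lies in the same group as the true type."""
    correct = total = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            total += count
            if group_of[i] == group_of[j]:
                correct += count
    return correct / total


# Group labels per the text: Types 1-4 plain, Types 5-7 glossy (0-based indices).
GROUP_OF = ["plain"] * 4 + ["glossy"] * 3
```

A misclassification between two plain papers lowers that type's rate but not the group accuracy, which is why the group-level figure can be much higher than the 77% worst-case per-type rate.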

It is also possible to use the CWM algorithm and the microscopic sensor to roughly determine the
weight of the media in the printer tray. This is done by associating the weight of each print
media sample with the surface texture characteristics by which it will be identified. This is
important information for a LaserJet print engine, since the printing mode and speed should
ideally be determined by the weight of the media.
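With weight as the CWM output, the estimate can be formed the usual cluster-weighted way: an average of per-cluster local models, weighted by p(x | c_k) p(c_k). The sketch assumes diagonal Gaussians and local linear models `[a0, a1, ..., ad]`; the function name and the 20 lb / 32 lb cluster outputs in the tests are hypothetical.

```python
import math


def cwm_predict(x, clusters):
    """Cluster-weighted prediction of a scalar output (here: media weight in lb).

    clusters: list of (prior, mean, var, coeffs) with coeffs = [a0, a1, ..., ad].
    Each local linear model is weighted by p(x | c_k) p(c_k)."""
    num = den = 0.0
    for prior, mean, var, coeffs in clusters:
        w = prior
        for xi, mi, vi in zip(x, mean, var):
            w *= math.exp(-((xi - mi) ** 2) / (2 * vi)) / math.sqrt(2 * math.pi * vi)
        local = coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))
        num += w * local
        den += w
    return num / den
```

A sample that falls between two weight clusters gets an intermediate estimate, so the result is best treated as a rough weight class rather than an exact value.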



[Figure 2: CWM for Media Classification Testing Results for Seven LaserJet Media Types in Two Media Groups]



The 4 inkjet media case is shown in figure 3 and is included to demonstrate the effect of media
surface characteristics on cluster geometry. The identification problem is easier here, since
there is less aliasing between clusters than for the LaserJet media tested: the data clouds are
significantly separated, even in this partial input vector space, and far fewer clusters are
required to fully estimate and predict the membership of a sample among the 4 different types
and groups of inkjet media studied.








[Figure 3: Input Data Clouds From Four Different Inkjet Media Types & CWM Estimation Results (plot title: CWM Analysis on Four Media Types)]



1 B. Schoner, C. Cooper, C. Douglas, N. Gershenfeld, Data-Driven Modeling and Synthesis of Acoustical Instruments

2 Andrew Barron, Universal Approximation Bounds for Superpositions of a Sigmoidal Function, IEEE
Transactions on Information Theory, 39:930-945, 1993

3 Neil Gershenfeld, The Nature of Mathematical Modeling, Cambridge University Press, New York, 1999

4 A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM
Algorithm, J. R. Statist. Soc. B, 39:1-38, 1977




